
[AMD] retrigger dsv4-fp4-mi355x-atom benchmark sweep #1817

Merged
Oseltamivir merged 34 commits into main from amd/retrigger-dsv4-atom-sweep on Jun 18, 2026

Conversation

@Oseltamivir Oseltamivir commented Jun 18, 2026

Collaborator

Summary

Validation

  • Parsed perf-changelog.yaml with PyYAML.
  • Confirmed the diff is append-only and file mode remains 100644.
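The append-only check above can be sketched in a few lines of Python. This is a minimal illustration, not the validation actually run for this PR: the helper name `appended_suffix` and the sample strings are hypothetical, and it compares raw file text rather than parsed YAML.

```python
from typing import Optional


def appended_suffix(old_text: str, new_text: str) -> Optional[str]:
    """Return the text appended to old_text, or None if the change is not append-only.

    Hypothetical helper: "append-only" here means the new file starts with the
    old file byte-for-byte and adds at least one character after it.
    """
    if not new_text.startswith(old_text) or len(new_text) == len(old_text):
        return None
    return new_text[len(old_text):]


# Placeholder changelog contents, not the real perf-changelog.yaml.
old = "- config-keys:\n    - example-a\n"
new = old + "- config-keys:\n    - example-b\n"

print(appended_suffix(old, new) is not None)                        # True
print(appended_suffix(old, old.replace("example-a", "example-c")))  # None
```

A text-prefix comparison is stricter than comparing parsed YAML, because it also rejects whitespace-only edits to earlier entries.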

Note

Low Risk
Append-only changelog metadata for CI orchestration; no runtime or config logic changes.

Overview
Retriggers the dsv4-fp4-mi355x-atom benchmark sweep by appending a new block at the end of perf-changelog.yaml. Sweep selection is driven by the changelog diff vs main, so a fresh append is enough to kick off another run.
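As a rough sketch of why a fresh append is enough, the selection step could look like the following. The CI logic itself is not shown in this PR, so the function name `config_keys_to_sweep`, the assumption that entries are dicts with a `config-keys` list, the use of the sweep name as a config key, and the placeholder `pr-link` value are all illustrative.

```python
def config_keys_to_sweep(main_entries, branch_entries):
    """Collect config keys from entries that exist on the branch but not on main.

    Assumes (hypothetically) that the changelog parses to a list of dicts and
    that the branch only appends to what main already has.
    """
    if branch_entries[:len(main_entries)] != main_entries:
        raise ValueError("changelog is not append-only relative to main")
    keys = []
    for entry in branch_entries[len(main_entries):]:
        for key in entry.get("config-keys", []):
            if key not in keys:  # keep first-seen order, drop repeats
                keys.append(key)
    return keys


# Placeholder entry; the pr-link value is not a real URL from this repository.
main = [{"config-keys": ["dsv4-fp4-mi355x-atom"],
         "pr-link": "https://example.invalid/pr"}]
# Appending an exact duplicate still produces a non-empty diff against main.
branch = main + [dict(main[0])]

print(config_keys_to_sweep(main, branch))  # ['dsv4-fp4-mi355x-atom']
```

Because the comparison is positional rather than content-based, a duplicated entry is still treated as new, which matches the retrigger behaviour described above.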

The new entry is a duplicate of the existing PR #1717 changelog (same config-keys, description bullets, and pr-link). It does not change .github/configs, launch scripts, or search-space YAML—only documents the re-run intent for reviewers.

The described benchmark context (unchanged by this PR) is DeepSeek-V4 FP4 on MI355X ATOM: image rocm/atom:…atom0.1.4_20260612, ISL=8192 search-space updates (TP8 conc 4–64, DPA conc 128–1024), and TBO at high concurrency.

Reviewed by Cursor Bugbot for commit dbc4d69. Bugbot is set up for automated code reviews on this repo.

seungrokj and others added 30 commits June 12, 2026 21:38
…e image to atom0.1.4

- Enable --enable-tbo for ISL=1024/OSL=1024 at CONC>=1024 and ISL=8192/OSL=1024 at CONC>=256
- Update image to atom0.1.4_20260612
- Update ISL=8192 search-space to start at conc=4 and use DPA from conc=128

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
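The TBO thresholds in this commit message can be expressed as a small lookup. This is a sketch under assumptions: the names `TBO_MIN_CONC` and `tbo_flags` are hypothetical, and the real launch script (not shown here) may implement the condition differently. Only the `--enable-tbo` flag and the ISL/OSL/CONC thresholds come from the commit message.

```python
# Minimum concurrency at which TBO is enabled, keyed by (ISL, OSL).
# Values are taken from the commit message; the table name is illustrative.
TBO_MIN_CONC = {
    (1024, 1024): 1024,
    (8192, 1024): 256,
}


def tbo_flags(isl: int, osl: int, conc: int) -> list:
    """Return the extra server flags for a given workload shape and concurrency."""
    threshold = TBO_MIN_CONC.get((isl, osl))
    if threshold is not None and conc >= threshold:
        return ["--enable-tbo"]
    return []  # below the threshold, or a shape with no TBO rule


print(tbo_flags(1024, 1024, 512))   # []
print(tbo_flags(1024, 1024, 1024))  # ['--enable-tbo']
print(tbo_flags(8192, 1024, 256))   # ['--enable-tbo']
```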
…onc range

- Pass --max-model-len to server using SERVE_MAX_MODEL_LEN
- Add EVAL_ONLY path: compute eval context length via compute_eval_context_length
- Extend conc-end to 8192 (isl=1024) and 4096 (isl=8192) in amd-master.yaml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…sable max-model-len

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…m-seqs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e, enable max-model-len

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ap conc ranges

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ace description

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… remove stale entries

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…x mem frac

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…add TP4/TP8 conc lists for ISL8192

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
seungrokj and others added 3 commits June 17, 2026 17:21
…nch override

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…/TP8/DPA

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. Failures are often just flakes, and simply re-running the failed jobs will fix them. If failed jobs are re-run, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get an approval from their respective company's CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

@Oseltamivir Oseltamivir marked this pull request as ready for review June 18, 2026 01:06
@Oseltamivir Oseltamivir requested a review from a team June 18, 2026 01:06
@claude

claude Bot commented Jun 18, 2026

Contributor

Claude finished @Oseltamivir's task in 35s


Review Complete

  • Gather context and read PR diff
  • Validate perf-changelog.yaml changes
  • Post review summary

LGTM - no blocking issues found.

The new entry is correctly appended at the end of perf-changelog.yaml (lines 3939–3945), preserving chronological ordering. The diff is append-only and re-triggers the dsv4-fp4-mi355x-atom sweep as intended.

@Oseltamivir

Collaborator Author

/reuse-sweep-run 27676739575

@Oseltamivir Oseltamivir merged commit 2cd1d01 into main Jun 18, 2026
26 checks passed
@Oseltamivir Oseltamivir deleted the amd/retrigger-dsv4-atom-sweep branch June 18, 2026 01:30
2 participants